You can run and edit these examples interactively on Galaxy
Search for MGnify Studies or Samples, using MGnifyR
The MGnify API returns data and relationships as JSON. MGnifyR is a package to help you read MGnify data into your R analyses.
This example shows you how to perform a search of MGnify Studies or Samples
You can find all of the other “API endpoints” using the Browsable API interface in your web browser. This interface also lets you inspect the kinds of Filters that can be created for each list.
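To make the JSON structure mentioned above more concrete, the sketch below parses a small hand-written payload with the jsonlite package. The payload is a simplified, hypothetical imitation of an API response: the selection of fields and the example.org link are assumptions for illustration, not real API output. It reuses a study accession that appears in the results later in this notebook.

```r
library(jsonlite)

# Hand-written, simplified payload (hypothetical; not copied from the API).
payload <- '{
  "data": [
    {
      "type": "studies",
      "id": "MGYS00002316",
      "attributes": {
        "accession": "MGYS00002316",
        "bioproject": "PRJEB24109",
        "samples-count": 1
      },
      "relationships": {
        "samples": {
          "links": {"related": "https://example.org/studies/MGYS00002316/samples"}
        }
      }
    }
  ]
}'

# Keep the nested list structure instead of simplifying to a data frame.
parsed <- fromJSON(payload, simplifyVector = FALSE)
first <- parsed$data[[1]]

first$id                              # identifier of the record
first$attributes[["samples-count"]]   # the "data" part of the record
names(first$relationships)            # the "relationships" part of the record
```

MGnifyR does this kind of parsing for you and returns data frames, so you rarely need to handle the raw JSON yourself.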
This is an interactive code notebook (a Jupyter Notebook). To run this code, click into each cell and press the ▶ button in the top toolbar, or press shift+enter.
MGnifyR is an R package that provides a convenient way for R users to access data from the MGnify API.
Detailed help for each function is available in R using the standard ?function_name command (e.g. typing ?mgnify_query will bring up the built-in help for the mgnify_query command).
A vignette is available containing a reasonably verbose overview of the main functionality. This can be read either within R with the vignette("MGnifyR") command, or in the development repository.
MGnifyR Command cheat sheet
The following list of key functions should give a starting point for finding relevant documentation.
mgnify_client() : Create the client object required for all other functions.
mgnify_query() : Search the whole MGnify database.
mgnify_analyses_from_xxx() : Convert xxx accessions to analyses accessions. xxx is either samples or studies.
mgnify_get_analyses_metadata() : Retrieve all study, sample and analysis metadata for given analyses.
mgnify_get_analyses_phyloseq() : Convert abundance, taxonomic, and sample metadata into a single phyloseq object.
mgnify_get_analyses_results() : Get functional annotation results for a set of analyses.
mgnify_download() : Download raw results files from MGnify.
mgnify_retrieve_json() : Low level API access helper function.
In these examples we set maxhits=1 to retrieve only the first page of results. You can change the limit or set it to -1 to retrieve all samples matching the query.
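The code cell that produced the results below is not shown in this text, so here is a minimal sketch of what such a search could look like. It assumes that the MGnifyR package is installed, that the MGnify API is reachable, and that the function signatures match the MGnifyR version this notebook was written for. The biome_name filter and its value "Wastewater" are assumptions chosen to resemble the output below; the text does not confirm them.

```r
library(MGnifyR)

# Create the client object required by all other MGnifyR functions.
mg <- mgnify_client()

# Search studies. maxhits = 1 retrieves only the first page of results;
# maxhits = -1 would retrieve everything that matches the query.
# NOTE: biome_name = "Wastewater" is an assumed filter, used for illustration only.
studies <- mgnify_query(mg, "studies", biome_name = "Wastewater", maxhits = 1)

# Inspect the first few rows of the returned data frame.
head(studies)
```

Starting with a small maxhits keeps the request fast while you check that the query returns what you expect.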
Abstract: The Third Party Annotation (TPA) assembly was derived from the primary whole genome shotgun (WGS) data set PRJNA230567, and was assembled with metaspades v3.15.3. This project includes samples from the following biomes: root:Engineered:Wastewater.
Name: EMG produced TPA metagenomics assembly of PRJNA230567 data set (Systems Biology of Lipid Accumulating Organisms).
Data origination: SUBMITTED

Type: studies
Accession: MGYS00006558
Samples: 89
BioProject: PRJNA230567
Private: FALSE
Last update: 2023-12-19T12:35:08
Secondary accession: SRP033648
Centre: Luxembourg Centre for Systems Biomedicine
Abstract: Characterization of microbial communities at the genomic, transcriptomic, proteomic and metabolomic levels, with a special interest on lipid accumulating bacterial populations, which are naturally enriched in biological wastewater treatment systems and may be harnessed for the conversion of mixed lipid substrates (wastewater) into biodiesel. The project aims to elucidate the genetic blueprints and the functional relevance of specific populations within the community. It focuses on within-population genetic and functional heterogeneity, trying to understand how fine-scale variations contribute to differing lipid accumulating phenotypes. Insights from this project will contribute to the understanding the functioning of microbial ecosystems, and improve optimization and modeling strategies for current and future biological wastewater treatment processes. This BioProject contains datasets derived from the same biological wastewater treatment plant. The date includes metagenomes, metatranscriptomes and organisms isolated in pure cultures.
Name: Systems Biology of Lipid Accumulating Organisms
Data origination: HARVESTED

Type: studies
Accession: MGYS00005985
Samples: 1
BioProject: PRJEB45225
Private: FALSE
Last update: 2022-03-11T21:49:39
Secondary accession: ERP129301
Centre: EMG
Abstract: The Third Party Annotation (TPA) assembly was derived from the primary whole genome shotgun (WGS) data set PRJNA593593, and was assembled with metaSPAdes v3.15.2. This project includes samples from the following biomes: root:Engineered:Wastewater.
Name: EMG produced TPA metagenomics assembly of PRJNA593593 data set (Sewage microbial communities from Oakland, California, United States - Biofuel Metagenome 10).
Data origination: SUBMITTED

Type: studies
Accession: MGYS00005997
Samples: 1
BioProject: PRJEB45727
Private: FALSE
Last update: 2022-03-11T21:12:02
Secondary accession: ERP129875
Centre: EMG
Abstract: The Third Party Annotation (TPA) assembly was derived from the primary whole genome shotgun (WGS) data set PRJNA593594, and was assembled with metaSPAdes v3.15.2. This project includes samples from the following biomes: root:Engineered:Wastewater.
Name: EMG produced TPA metagenomics assembly of PRJNA593594 data set (Sewage microbial communities from Oakland, California, United States - Biofuel Metagenome 11).

Name: Sewage microbial communities from Oakland, California, United States - Biofuel Metagenome 10
Data origination: HARVESTED

Type: studies
Accession: MGYS00002316
Samples: 1
BioProject: PRJEB24109
Private: FALSE
Last update: 2022-02-03T15:58:54
Secondary accession: ERP105914
Centre: EMBL-EBI
Abstract: The activated sludge metagenome Third Party Annotation (TPA) assembly was derived from the primary whole genome shotgun (WGS) data set: PRJNA340752. This project includes samples from the following biomes: Engineered, Wastewater, Activated Sludge.
Name: EMG produced TPA metagenomics assembly of the Active sludge microbial communities of municipal wastewater-treating anaerobic digesters from China - AD_SCU002_MetaG metagenome (activated sludge metagenome) data set.
To find metadata_keys and values, it is best to browse the interactive API Browser, and use the Filters button to construct queries interactively at first.
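Once you have found a suitable key in the API Browser, a query with metadata filters might look like the sketch below. The filter names metadata_key, metadata_value_gte and metadata_value_lte, the key "depth" and the chosen range are all assumptions for illustration; check them against the Filters listed for the samples endpoint before relying on them. The sketch also assumes MGnifyR is installed and the API is reachable.

```r
library(MGnifyR)

# Create the client object required by all other MGnifyR functions.
mg <- mgnify_client()

# NOTE: the three metadata_* filter names and the "depth" key are assumptions;
# verify them with the Filters button in the API Browser.
deep_samples <- mgnify_query(
  mg, "samples",
  metadata_key = "depth",      # which metadata field to filter on
  metadata_value_gte = 25,     # lower bound for that field
  metadata_value_lte = 50,     # upper bound for that field
  maxhits = 1                  # first page of results only
)

head(deep_samples)
```

Filtering in the query reduces the amount of data downloaded, whereas filtering the returned data frame in R, as in the next example, lets you apply any condition you like.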
Example: adding additional filters to the data frame
First, fetch some samples from the Lentic biome. We can specify the entire Biome lineage, too.
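The fetching step is not shown in this text, so the sketch below shows one way it could be written. It assumes MGnifyR is installed and the API is reachable. The biome_name filter and the lineage string "root:Environmental:Aquatic:Lentic" are assumptions for illustration; the variable name lentic_samples matches the one used in the filtering code that follows.

```r
library(MGnifyR)

# Create the client object required by all other MGnifyR functions.
mg <- mgnify_client()

# Fetch samples from the Lentic biome, giving the full biome lineage.
# NOTE: the biome_name filter and this lineage string are assumptions.
lentic_samples <- mgnify_query(
  mg, "samples",
  biome_name = "root:Environmental:Aquatic:Lentic",
  maxhits = 1   # first page of results only; use -1 to retrieve all matches
)

head(lentic_samples)
```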
Now, also filter by depth within the returned results, using normal R syntax.
depth_numeric = as.numeric(lentic_samples$depth)  # We must convert data from MGnifyR (always strings) to numerical format.
depth_numeric[is.na(depth_numeric)] = 0.0  # If depth data is missing, assume it is surface-level.
lentic_subset = lentic_samples[depth_numeric >= 25 & depth_numeric <= 50, ]  # Filter to samples collected between 25m and 50m down.
lentic_subset
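To see what the conversion and the missing-value handling do without contacting the API, here is a self-contained sketch of the same steps on a small hand-made data frame. The sample names and depths are invented for illustration and are not MGnify data.

```r
# Hypothetical stand-in for a data frame returned by MGnifyR (values are invented).
toy_samples <- data.frame(
  sample = c("s1", "s2", "s3", "s4"),
  depth = c("10", "30.5", NA, "50"),
  stringsAsFactors = FALSE
)

# Metadata arrives as strings, so convert to numeric before comparing.
depth_numeric <- as.numeric(toy_samples$depth)

# Treat missing depths as surface-level (0 m), as in the example above.
depth_numeric[is.na(depth_numeric)] <- 0.0

# Keep samples collected between 25 m and 50 m (inclusive).
toy_subset <- toy_samples[depth_numeric >= 25 & depth_numeric <= 50, ]
toy_subset$sample
```

Replacing the missing values before filtering matters: a logical index that contains NA would otherwise add rows filled with NA to the subset.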